Apparatus and associated methods for presentation of audio

ABSTRACT

An apparatus comprising means configured to: receive audio content from a remote user device, the audio content comprising primary audio and secondary audio, the secondary audio being different to the primary audio and comprising ambient audio; receive secondary audio importance information associated with said audio content and indicative of an importance of the secondary audio; receive current audio presentation information indicative of at least whether audio from one or more audio sources is currently being presented as spatial audio such that respective audio of the one or more audio sources is to be perceived as originating from one or more respective directions or ranges of directions around the user; provide for presentation of the primary audio; and provide for presentation of the secondary audio based on the secondary audio importance information and the current audio presentation information.

TECHNICAL FIELD

The present disclosure relates to the field of presentation of audio and, in particular, to the presentation of audio content, such as telecommunication audio content or immersive telecommunication audio content. The present disclosure also relates to associated apparatus, methods and computer programs.

BACKGROUND

Telecommunication or telephony systems are being developed that provide for more than monophonic capture and presentation of audio. The audio of such telephony may comprise spatial audio. The presentation of such audio may require careful consideration to ensure the telecommunication is clear and effective.

The listing or discussion of a prior-published document or any background in this specification should not necessarily be taken as an acknowledgement that the document or background is part of the state of the art or is common general knowledge. One or more aspects/examples of the present disclosure may or may not address one or more of the background issues.

SUMMARY

In a first aspect of the disclosure there is provided an apparatus comprising means configured to:

-   receive audio content from a remote user device, the audio content comprising primary audio and secondary audio, the secondary audio being different to the primary audio and comprising ambient audio;
-   receive secondary audio importance information associated with said audio content and indicative of an importance of the secondary audio;
-   receive current audio presentation information indicative of at least whether audio from one or more audio sources is currently being presented as spatial audio such that respective audio of the one or more audio sources is to be perceived as originating from one or more respective directions or ranges of directions around a reference point;
-   provide for presentation of the primary audio; and
-   provide for presentation of the secondary audio based on the secondary audio importance information and the current audio presentation information.

In one or more examples, the reference point is indicative of the location of a user to whom the audio content is presented. In one or more examples, the primary audio comprises voice audio comprising audio determined to be generated by a voice of one or more remote users, such as for telecommunication with the user, and the secondary audio comprises ambient audio comprising audio other than that determined to be generated by the voice of the one or more remote users. In one or more examples, the primary audio comprises spatial audio that includes directional information such that, when presented, it is to be perceived as originating from a direction or range of directions in accordance with the directional information and the secondary audio comprises at least one of audio without said directional information and spatial audio with said directional information that defines a range of directions from which the audio should be perceived greater than a threshold range of directions. In one or more examples, the audio content comprises telecommunication audio content comprising audio content provided for the purpose of telecommunication, which may be via a traditional telecommunication network or provided by a voice over IP or any other packet-based or circuit switched telephony service. In one or more examples in which the primary audio and/or the secondary audio comprises spatial audio, the primary audio may comprise one or more audio channels each associated with respective one or more audio objects, the audio objects each having a defined location from which the associated audio channel, when presented, is to be perceived. In one or more examples, the locations associated with the primary audio are located such and/or have a width that is less than the threshold range of directions. In one or more examples, the locations associated with the secondary audio are located such and/or have a width that is greater than a threshold range of directions.

In one or more examples, said secondary audio importance information is received from the remote user device with said audio content. In one or more examples, said audio content is provided as part of a call between the remote user and a user of the apparatus, and wherein the secondary audio importance information is set by the remote user or automatically determined at least for each call. In one or more examples, said secondary audio importance information is set by the remote user via the remote user device and is based on said audio content. In one or more examples, said secondary audio importance information is determined by and received from a server that receives said audio content from the remote user device.

In one or more examples, the secondary audio importance information defines at least two levels of importance comprising important and unimportant and wherein the apparatus includes means configured to:

-   provide for presentation of the secondary audio based on the secondary audio importance information being indicative of the secondary audio being important, said presentation based on the current audio presentation information; and at least one of:
-   provide for presentation of the secondary audio based on the secondary audio importance information being indicative of the secondary audio being unimportant and the current audio presentation information being indicative that none of the one or more audio sources are currently presenting audio; and
-   provide for non-presentation of the secondary audio based on the secondary audio importance information being indicative of the secondary audio being unimportant and the current audio presentation information being indicative that at least one of the one or more audio sources are currently presenting audio.
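The two-level importance rule above can be illustrated with a minimal sketch. The names `Importance` and `should_present_secondary` are hypothetical and do not come from the disclosure, and the sketch assumes that both of the "unimportant" branches are implemented, although the text only requires at least one of them.

```python
from enum import Enum


class Importance(Enum):
    """Hypothetical two-level importance value carried with the audio content."""
    IMPORTANT = "important"
    UNIMPORTANT = "unimportant"


def should_present_secondary(importance: Importance, other_sources_active: bool) -> bool:
    """Decide whether the secondary (ambient) audio is presented at all.

    other_sources_active stands in for the current audio presentation
    information: True if at least one audio source is currently presenting audio.
    """
    if importance is Importance.IMPORTANT:
        # Important ambience is always presented; how it is presented
        # (volume, range of directions) is decided elsewhere.
        return True
    # Unimportant ambience is presented only when nothing else is playing.
    return not other_sources_active
```

In this sketch the function only answers whether the ambience is presented; how an important ambience is fitted around existing audio sources is left to the renderer.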

In one or more examples, said means are configured to:

-   provide for presentation of the secondary audio, based on the secondary audio importance information being indicative of the secondary audio being important and the current audio presentation information being indicative of at least one of the one or more audio sources currently presenting audio, and provide for modification of one or both of a volume or the range of directions with which the audio of at least one of the one or more audio sources is presented to accommodate presentation of the secondary audio.

In one or more examples, said means are configured to:

-   provide for presentation of the secondary audio such that it is perceived as originating from all directions around the user, based on the secondary audio importance information being indicative of the secondary audio being important and the current audio presentation information being indicative that none of the one or more audio sources are currently presenting audio.

In one or more examples, the means are configured to provide for presentation of the primary audio as spatial audio such that it is to be perceived as originating from a direction or range of directions that is non-overlapping with a direction or range of directions associated with the audio of the one or more audio sources based on the current audio presentation information.

In one or more examples, the means are configured to:

-   receive default perceived location information which defines a default perceived location for the audio content;
-   provide for presentation of at least said primary audio of said audio content as spatial audio to be perceived as originating from said default perceived location.

In one or more examples, the means may be configured to provide for presentation, if said default perceived location information is not received, of said primary audio of said audio content as spatial audio to be perceived from a direction or range of directions that is non-overlapping with any audio that is presented by the one or more audio sources.
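One way to picture the choice of a perceived direction for the primary audio is the sketch below. It is an illustration under stated assumptions, not the disclosed method: the function name `choose_primary_direction` is hypothetical, directions are reduced to azimuth angles in degrees, each occupied sector is assumed to run clockwise from its start to its end with a positive width below 360 degrees, occupied sectors are assumed not to overlap each other, and "straight ahead" (0 degrees) is an assumed fallback when nothing else is playing.

```python
from typing import List, Optional, Tuple

Sector = Tuple[float, float]  # (start_deg, end_deg), running clockwise from start to end


def choose_primary_direction(occupied: List[Sector],
                             default_deg: Optional[float] = None) -> float:
    """Pick an azimuth (degrees) from which the primary audio is to be perceived."""
    if default_deg is not None:
        # A default perceived location was received: use it.
        return default_deg % 360.0
    if not occupied:
        # Nothing else is being presented; assumed fallback is straight ahead.
        return 0.0
    sectors = sorted((s % 360.0, e % 360.0) for s, e in occupied)
    best_gap, best_centre = -1.0, 0.0
    for i, (_, end) in enumerate(sectors):
        # Free arc between the end of this sector and the start of the next one.
        next_start = sectors[(i + 1) % len(sectors)][0]
        gap = (next_start - end) % 360.0
        if gap > best_gap:
            best_gap = gap
            best_centre = (end + gap / 2.0) % 360.0
    # Centre of the widest free arc, so the voice does not overlap existing audio.
    return best_centre
```

Placing the voice at the centre of the widest free arc is only one possible policy; a real renderer could equally prefer the free arc closest to the front of the user.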

In one or more examples, said means are configured to:

-   capture user audio content of a user;
-   send said captured user audio content to the remote user device to provide for telecommunication between the user and the remote user of the remote user device, wherein said user audio content comprises primary audio and secondary audio, the secondary audio being different to the primary audio and comprising ambient audio; and
-   send secondary audio importance information associated with said user audio content and indicative of an importance of the secondary audio of the user audio content for use by the remote user device, the secondary audio importance information based on user input received from the user.

In one or more examples, the primary audio comprises voice audio comprising audio determined to be generated by a voice of at least the user, such as for telecommunication with the remote user, and the secondary audio comprises ambient audio comprising audio other than that determined to be generated by the voice of the user. In one or more examples, the primary audio comprises spatial audio that includes directional information such that, when presented, it is to be perceived as originating from a direction or range of directions in accordance with the directional information and the secondary audio comprises at least one of audio without said directional information and spatial audio with said directional information that defines a range of directions from which the audio should be perceived greater than a threshold range of directions.

In one or more examples, the secondary audio importance information is based on one or more of:

-   audio analysis of the user audio content; and
-   a determined current location of the user.

In one or more examples, the means are configured to, on determination that the secondary audio importance information associated with said captured user audio content is indicative of the user audio content being unimportant, modify the captured user audio content from being categorised as primary audio and secondary audio to one of monophonic and stereophonic audio prior to said sending of the captured user audio content or capture the user audio content as one of monophonic and stereophonic.
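A minimal sketch of the sender-side simplification described above follows. It assumes audio is held as lists of float samples of equal length and that an equal-weight average is an acceptable downmix; the function name `prepare_for_sending` and the dictionary layout are hypothetical and serve illustration only.

```python
from typing import Dict, List, Sequence


def prepare_for_sending(primary: Sequence[float],
                        ambience: Sequence[Sequence[float]],
                        ambience_important: bool) -> Dict[str, object]:
    """Keep the primary/secondary split only when the ambience is important."""
    if ambience_important:
        return {
            "format": "primary+secondary",
            "primary": list(primary),
            "secondary": [list(channel) for channel in ambience],
        }
    # Ambience is unimportant: fold everything into one monophonic channel
    # by averaging all channels sample by sample (assumed equal weights).
    channels = [primary, *ambience]
    mono: List[float] = [sum(samples) / len(channels) for samples in zip(*channels)]
    return {"format": "mono", "audio": mono}
```

Sending a single monophonic channel in the unimportant case avoids transmitting ambience that the receiving side would not present.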

In one or more examples, said means are configured to send user-monitoring information to the remote user device, the user-monitoring information indicative of whether or not the secondary audio is being presented for at least notifying said remote user.

In one or more examples, said means are configured to send user-monitoring information to the remote user device, the user-monitoring information indicative of one or more of:

-   a presentation direction comprising a direction defined relative to a reference direction comprising from where the audio content is to be perceived when the apparatus is in use, the presentation position for use in presenting the audio content to the remote user by the remote user device;
-   audio of at least one of the one or more audio sources presented to the user as defined in the current audio presentation information for presenting to the remote user by the remote user device; and
-   a reference to at least one of the one or more audio sources presented to the user as defined in the current audio presentation information from which the audio can be retrieved for presenting said audio of the at least one of the one or more audio sources to the remote user by the remote user device.

In one or more examples, said means are configured to:

-   receive remote-user-monitoring information from the remote user device, and one or more of:
-   provide for presentation of the user audio content to be perceived as originating from a position corresponding to a presentation position, wherein the remote-user-monitoring information from the remote user device comprises the presentation position which is indicative of a position relative to the remote user from where the user audio content is to be perceived when presented by the remote user device; and
-   provide for presentation of the audio of at least one of one or more audio sources currently being presented to the remote user, wherein said remote-user-monitoring information received from the remote user device comprises said audio or a reference to the at least one of the one or more audio sources presented to the remote user.

In a further aspect there is provided a method, the method comprising:

-   receiving audio content from a remote user device, the audio content comprising primary audio and secondary audio, the secondary audio being different to the primary audio and comprising ambient audio;
-   receiving secondary audio importance information associated with said audio content and indicative of an importance of the secondary audio;
-   receiving current audio presentation information indicative of at least whether audio from one or more audio sources is currently being presented as spatial audio such that respective audio of the one or more audio sources is to be perceived as originating from one or more respective directions or ranges of directions around a reference point;
-   providing for presentation of the primary audio; and
-   providing for presentation of the secondary audio based on the secondary audio importance information and the current audio presentation information.

In a further aspect there is provided a computer readable medium comprising computer program code stored thereon, the computer readable medium and computer program code being configured to, when run on at least one processor, perform the method of:

-   receiving audio content from a remote user device, the audio content comprising primary audio and secondary audio, the secondary audio being different to the primary audio and comprising ambient audio;
-   receiving secondary audio importance information associated with said audio content and indicative of an importance of the secondary audio;
-   receiving current audio presentation information indicative of at least whether audio from one or more audio sources is currently being presented as spatial audio such that respective audio of the one or more audio sources is to be perceived as originating from one or more respective directions or ranges of directions around a reference point;
-   providing for presentation of the primary audio; and
-   providing for presentation of the secondary audio based on the secondary audio importance information and the current audio presentation information.

In a further aspect there is provided an apparatus, the apparatus comprising means configured to:

-   send user audio content, captured by a local user device, to a remote user device for presentation to a remote user, said user audio content comprising primary audio and secondary audio, the secondary audio being different to the primary audio and comprising ambient audio; and
-   send secondary audio importance information associated with said user audio content and indicative of an importance of the secondary audio.

In one or more examples, said user audio content comprises audio of at least a user of the local user device.

In a further aspect there is provided a method, the method comprising:

-   sending user audio content, captured by a local user device, to a remote user device for presentation to a remote user, said user audio content comprising primary audio and secondary audio, the secondary audio being different to the primary audio and comprising ambient audio; and
-   sending secondary audio importance information associated with said user audio content and indicative of an importance of the secondary audio.

In a further aspect there is provided a computer readable medium comprising computer program code stored thereon, the computer readable medium and computer program code being configured to, when run on at least one processor, perform the method of:

-   sending user audio content, captured by a local user device, to a remote user device for presentation to a remote user, said user audio content comprising primary audio and secondary audio, the secondary audio being different to the primary audio and comprising ambient audio; and
-   sending secondary audio importance information associated with said user audio content and indicative of an importance of the secondary audio.

In a further example aspect there is provided an apparatus comprising:

-   at least one processor; and
-   at least one memory including computer program code,
-   the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following:
-   receive audio content from a remote user device, the audio content comprising primary audio and secondary audio, the secondary audio being different to the primary audio and comprising ambient audio;
-   receive secondary audio importance information associated with said audio content and indicative of an importance of the secondary audio;
-   receive current audio presentation information indicative of at least whether audio from one or more audio sources is currently being presented as spatial audio such that respective audio of the one or more audio sources is to be perceived as originating from one or more respective directions or ranges of directions around a reference point;
-   provide for presentation of the primary audio; and
-   provide for presentation of the secondary audio based on the secondary audio importance information and the current audio presentation information.

In a further example aspect there is provided an apparatus comprising:

-   at least one processor; and
-   at least one memory including computer program code,
-   the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following:
-   send user audio content, captured by a local user device, to a remote user device for presentation to a remote user, said user audio content comprising primary audio and secondary audio, the secondary audio being different to the primary audio and comprising ambient audio; and
-   send secondary audio importance information associated with said user audio content and indicative of an importance of the secondary audio.

The present disclosure includes one or more corresponding aspects, examples or features in isolation or in various combinations whether or not specifically stated (including claimed) in that combination or in isolation. Corresponding means and corresponding functional units (e.g., function enabler, speaker selector, amplifier, display device) for performing one or more of the discussed functions are also within the present disclosure.

Corresponding computer programs for implementing one or more of the methods disclosed are also within the present disclosure and encompassed by one or more of the described examples.

The above summary is intended to be merely exemplary and non-limiting.

BRIEF DESCRIPTION OF THE FIGURES

A description is now given, by way of example only, with reference to the accompanying drawings, in which:

FIG. 1 illustrates an example apparatus for providing for presentation of audio;

FIG. 2 illustrates telecommunication between a user and a remote user;

FIG. 3 illustrates an example block diagram of an immersive spatial audio encoder;

FIG. 4 illustrates an example presentation space in which a user is presented with audio;

FIG. 5 illustrates the same presentation space but overlaid with an example spatial audio scene showing where audio is perceived to originate that is provided to the user under the control of the apparatus;

FIG. 6 illustrates a plan view of FIG. 5;

FIG. 7 illustrates the presentation of audio to the user and remote user to exemplify the use of the user-monitoring information;

FIG. 8 illustrates the actions of the apparatus and the remote apparatus in conducting the telecommunication call;

FIG. 9 shows a flowchart illustrating an example method; and

FIG. 10 shows a computer readable medium.

DESCRIPTION OF EXAMPLE ASPECTS

Telecommunication or telephony systems are being developed that provide for more than monophonic capture and monophonic presentation of audio. Immersive telephony systems are being developed, such as by the 3rd Generation Partnership Project (3GPP), that will enable a new level of immersion in telephony services. Immersive telephony may comprise the use of spatial audio presentation techniques and the capture of spatial audio content in the provision of telecommunication between users. Such a service can be realized, e.g., over a mobile 4G or 5G network by multi-microphone spatial audio capture and processing, encoding in a suitable immersive audio format, transmission and decoding, and binaural or loudspeaker presentation. Such systems may provide for transmission of and presentation of immersive, spatial audio content, such as parametric spatial audio. This may enable receiving and sending of an enveloping audio scene from/to the telecommunication call participants or users. Thus, for example, when a remote user calls a user, the user can experience the audio environment around the remote user as if he/she was physically located at the location of the remote user and vice versa.

In one or more examples, however, the user may already be experiencing immersive spatial audio content from one or more audio sources. Accordingly, it may require careful consideration to provide for presentation of immersive audio telecommunication content in combination with a pre-existing spatial audio scene comprising audio from one or more audio sources presented as spatial audio such that said audio is to be perceived from one or more respective directions. For example, complications relating to auditory masking in time, frequency and/or spatial accuracy may occur for at least one of the one or more audio sources during a simultaneous presentation.

In one or more examples, the audio content provided as part of said telecommunication may be categorised as primary audio and secondary audio. The primary audio may comprise the important audio for understanding the telecommunication call while the secondary audio may comprise ambient audio. Ambient audio may be considered the background audio of the audio content. It will be appreciated that while the examples herein relate to audio content in the field of telecommunication, the principles may be applied to other fields of audio content presentation.

In one or more examples, the primary audio may comprise voice audio comprising audio determined to be generated by a voice of one or more remote users in telecommunication with a user (who may be referred to as a local user). The “voice” primary audio may be categorised at the point of capture or at the point of play back using audio analysis techniques, or by a server or any other entity involved in said telecommunication. The secondary audio may, in one or more examples, comprise ambient audio comprising audio other than that determined to be generated by the voice of one or more remote users. Thus, in one or more examples, a first microphone configured to detect the user's voice may provide the primary audio and one or more other microphones configured to detect audio from elsewhere may provide the secondary audio. It will be appreciated that with multi-microphone arrangements the same audio may be detected by more than one microphone and therefore audio processing techniques may be used to separate the voice audio detected primarily by the first microphone from the audio detected by the other microphones. For example, if a mobile telephone is used to capture the audio, a microphone near the user's mouth may be configured to capture, primarily, the voice audio and one or more microphones on the back of the mobile telephone may be configured to capture the ambient audio. In one or more examples, a single microphone may be used and audio processing algorithms may be used to separate the voice audio from any ambient noise to provide for audio content categorized as primary audio and secondary audio, wherein such algorithms are known to those skilled in the art. In further examples, the voice audio may be captured using a close-up microphone or microphones, while the ambience may be captured using a microphone array (such as an Ambisonics microphone) which may have a fixed position in the scene.

In one or more examples, the primary audio may comprise spatial audio content that includes directional information such that, when presented, it is to be perceived as originating from one or more directions in accordance with the directional information. In one or more examples, the primary audio may not include directional information. In one or more examples, the secondary audio may comprise ambient audio comprising audio without said directional information or without a direction of arrival distinguishable above a threshold level. In one or more examples, the ambient audio comprises spatial audio, but may not have clear directions (e.g. above the threshold level of directionality) that the user can perceive or that can be determined from the audio that was captured. Thus, in one or more examples, the secondary audio may include at least some directional information too. However, the transmitting end, or any other element, may consider these directional sound components to not provide information that is particularly relevant or perceptually important for the communication. Thus, in one or more examples, a classification or content analysis may be performed to determine which audio should be classified as primary and which as secondary and, optionally, whether or not directional information should be associated with the audio.

It will be appreciated that in one or more examples, the primary audio may be important for understanding a telecommunication call while the secondary, ambient audio may be considered to be the background audio at the location of the remote user and therefore only “possibly” important to the telecommunication call. The secondary audio, by default, may be configured for presentation such that it is heard from a wide range or all directions or such that it is to be perceived from no specific direction or location. Thus, the secondary audio may be configured, by default, to be provided for replicating the ambient audio environment of the remote user to the user or vice versa. If the audio content is an audiobook, the primary audio may comprise the reader of the audiobook, while the secondary audio may comprise background sounds provided to supplement the story. In one or more examples, the secondary audio may be selectively presented, or its presentation may be modified from its default presentation based on one or both of its importance and audio that is already being presented to the user on receipt of the telecommunication call.

With primary audio that comprises spatial audio content, the direction from which audio was received at the location of the remote user may be reproduced when presenting the audio to the first user (or any other user) by use of spatial audio presentation. In one or more examples, the primary audio may be converted to monophonic audio (such as from spatial audio content) and presented using spatial audio presentation such that it is perceived from a desired direction or location.

Spatial audio comprises audio presented in such a way to a user that it is perceived to originate from a particular location or direction, as if the source of the audio was located at that particular location or direction. Spatial audio content comprises audio for presentation as spatial audio and, as such, typically comprises audio having directional information (either explicitly specified as, for example, metadata or inherently present in the way the audio is captured), such that the spatial audio content can be presented such that its component audio is perceived to originate from one or more points or one or more directions in accordance with the directional information. One way to encode and deliver spatial audio for an immersive audio telecommunication call is to encode the user's voice and the spatial ambience separately. Various encoding formats exist including, e.g., near-far stereo, First Order Ambisonics (FOA)/Higher Order Ambisonics (HOA) (+objects), and other spatial audio encoding schemes. The primary and secondary audio, in one or more examples, may be provided using the above-mentioned encoding schemes.

In one or more examples, non-spatial audio content may be presented as spatial audio. Thus, “conventional” monophonic or stereophonic audio (or audio converted to such a format) may be provided for presentation such that it will be perceived to originate from a particular location or direction. One or more of the embodiments described herein may present spatial audio based on spatial audio content or non-spatial audio content.

The spatial positioning of the spatial audio may be provided by 3D audio effects, such as those that utilise a head related transfer function to create a spatial audio space (aligned with a real-world space in the case of augmented reality) in which audio can be positioned for presentation to a user. Spatial audio may be presented by headphones by using head-related-transfer-function (HRTF) filtering techniques or, for loudspeakers, by using vector-base-amplitude panning techniques to position the perceived aural origin of the audio content. In other embodiments ambisonics audio presentation may be used to present spatial audio.
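As an illustration of the loudspeaker case, the sketch below computes two-dimensional vector base amplitude panning gains for a single pair of loudspeakers. It is a simplified example, not the renderer of the disclosure: the function name `vbap_pair_gains` is hypothetical, angles are azimuths in degrees in the horizontal plane, and the gains are normalised to constant power.

```python
import math
from typing import Tuple


def vbap_pair_gains(source_deg: float, spk1_deg: float, spk2_deg: float) -> Tuple[float, float]:
    """Return (g1, g2) such that g1*l1 + g2*l2 points towards the source."""
    px, py = math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg))
    l1x, l1y = math.cos(math.radians(spk1_deg)), math.sin(math.radians(spk1_deg))
    l2x, l2y = math.cos(math.radians(spk2_deg)), math.sin(math.radians(spk2_deg))
    # Solve the 2x2 system [l1 l2] * [g1 g2]^T = p with Cramer's rule.
    det = l1x * l2y - l2x * l1y
    if abs(det) < 1e-9:
        raise ValueError("loudspeakers must not be collinear with the listener")
    g1 = (px * l2y - l2x * py) / det
    g2 = (l1x * py - px * l1y) / det
    # Normalise so that g1^2 + g2^2 = 1 (constant perceived power).
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm
```

A source midway between the loudspeakers receives equal gains, and a source at a loudspeaker direction is fed to that loudspeaker alone. A negative gain indicates that the source lies outside the arc spanned by the pair, in which case a different pair would normally be selected.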

Spatial audio may use one or more of volume differences, timing differences and pitch differences between audible presentation to each of a user's ears to create the perception that the origin of the audio is at a particular location or in a particular direction in space. The perceived distance to the perceived origin of the audio may be rendered by controlling the amount of reverberation and gain to indicate closeness or distance from the perceived source of the spatial audio. It will be appreciated that spatial audio presentation as described herein may relate to the presentation of audio with only a perceived direction towards its origin as well as the presentation of audio such that the origin of the audio has a perceived location, e.g. including a perception of distance from the user.
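To make the timing and gain cues concrete, the sketch below uses two textbook approximations that are not taken from the disclosure: Woodworth's spherical-head formula for the interaural time difference, and the free-field inverse-distance law for gain. The head radius, the speed of sound and the function names are assumed values chosen for illustration.

```python
import math

HEAD_RADIUS_M = 0.0875       # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0   # assumed speed of sound in air at room temperature


def interaural_time_difference(azimuth_deg: float) -> float:
    """Woodworth approximation of the ITD in seconds, for |azimuth| <= 90 degrees.

    The sign of the result follows the sign of the azimuth and indicates
    which ear the sound reaches first.
    """
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (theta + math.sin(theta))


def distance_gain(distance_m: float, reference_m: float = 1.0) -> float:
    """Inverse-distance gain, clamped so sources closer than the reference are not boosted."""
    return reference_m / max(distance_m, reference_m)
```

A renderer could delay the signal to the far ear by the returned time difference and scale both ears by the distance gain; reverberation, which the text also mentions as a distance cue, is not modelled here.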

Example FIG. 1 and example FIG. 2 show an apparatus 100 that may be used to control or provide for the presentation of audio content to a user 200. The apparatus 100 may have application in the field of telecommunication and therefore in the examples that follow, the audio content is referred to as telecommunication audio content. However, it will be appreciated that the audio content need not be for telecommunication. It will also be appreciated that reference to telecommunication audio content implies no limitation on how the audio content is delivered and may be via traditional telephone networks, mobile or cell phone based networks, data networks such as the Internet using voice over IP or any telephony service whether packet based or circuit switched. The apparatus 100 may include means to receive telecommunication audio content or information about the receipt of telecommunication audio content by another apparatus, such as an input I. It will be appreciated that the apparatus 100 may include further inputs, such as to receive audio from other audio sources for presentation to the user 200 or information about other audio that is presented by another apparatus to the user 200. The input I may receive audio, such as immersive audio comprising primary and secondary audio, from a remote user device 202 (shown in FIG. 2). In one or more examples, the apparatus 100 may comprise part of a local user device 201 which may have the capability to provide for telecommunication with the remote user device 202, via a telecommunication network 203. The local user device 201 and/or remote user device 202 may comprise mobile telephones or any other telecommunication equipment. Thus, the first user 200 may be able to communicate with a remote user 204 associated with the remote user device 202.

While the description that follows primarily describes the apparatus 100 as part of the local user device 201, it will be appreciated that a corresponding remote apparatus (not shown but equivalent to apparatus 100) may be provided as part of the remote user device 202 and may perform equivalent functions on audio and information received from the party who is remote thereto.

The apparatus 100 may comprise or be connected to a processor 108 and a memory 109 and may be configured to execute computer program code. The apparatus 100 may have only one processor 108 and one memory 109 but it will be appreciated that other embodiments may utilise more than one processor and/or more than one memory (e.g. same or different processor/memory types). Further, the apparatus 100 may be an Application Specific Integrated Circuit (ASIC).

The processor may be a general-purpose processor dedicated to executing/processing information received from other components, such as telecommunication audio content, in accordance with instructions stored in the form of computer program code in the memory. The output signalling generated by such operations of the processor is provided onwards to further components, such as to speakers, headphones, an amplifier or other audio presentation equipment (not shown) to present audio to the user 200.

The memory 109 (not necessarily a single memory unit) is a computer readable medium (solid state memory in this example, but may be other types of memory such as a hard drive, ROM, RAM, Flash or the like) that stores computer program code. This computer program code stores instructions that are executable by the processor, when the program code is run on the processor. The internal connections between the memory and the processor can be understood, in one or more example embodiments, to provide an active coupling between the processor and the memory to allow the processor to access the computer program code stored on the memory.

In this example, the respective processors and memories are electrically connected to one another internally to allow for electrical communication between the respective components. In this example, the components are all located proximate to one another so as to be formed together as an ASIC, in other words, so as to be integrated together as a single chip/circuit that can be installed into an electronic device. In some examples one or more or all of the components may be located separately from one another.

Thus, the user 200 may be presented with audio from one or more audio sources (not shown), such as music from a music player, audio from a work presentation provided on a laptop computer and audio alerting the user to the arrival of emails or messages, also from the laptop computer. The apparatus 100 may provide for presentation of the aforementioned audio from the audio sources or may control the presentation of the audio from the audio sources or may receive information about what audio from the audio sources is being presented to the user 200 from audio presentation equipment. The apparatus 100 may provide for presentation of the telecommunication audio content or control the presentation of the telecommunication audio content in combination with the audio from the one or more audio sources. In one or more examples, the apparatus 100 may provide a user interface for control of the presentation of the aforementioned audio and audio content from telecommunication.

FIG. 3 shows an example block diagram of the capture and encoding of immersive audio content by an immersive audio encoder. Block 301 illustrates the capture of audio by one or more microphones from one or more sources, such as from a mobile telephone, immersive video capture device, a computer or a smartphone. Block 302 illustrates the receiving of audio that may be captured in various formats, such as monophonically, as ambisonics audio, as multiple channels or streams, wherein said audio may, in one or more examples, be associated with metadata that may define at least the direction of arrival of the audio from the source or location of the source of audio. An encoder block 303 receives the audio captured in its various formats. The encoder block 303 may provide for audio mixing. The metadata associated with the audio may be captured in a plurality of different formats. In one or more examples the block 303 may translate the metadata into a standard format. In one or more examples, the audio itself may be captured and encoded in different formats and the block 303 may transcode the audio to a standard format or formats. Block 304 provides for generation of a bitstream in an immersive audio encoded format. In one or more examples, the audio content referred to herein may be processed as described above.

Example FIG. 4 shows the user 200 in an audio presentation space 400 comprising their living room. The living room 400 comprises a sofa 401, bookcase 402, big screen TV 403 and speakers 404, 405. The user 200 may be experiencing audio content presented in an immersive way, such as spatial audio presentation. The audio content may be from one or more audio sources. The audio sources may provide audio in various formats, such as spatial audio, monophonic or stereophonic audio. Audio presentation equipment (not shown, although headphones 406 may comprise part) may present the audio as spatial audio from the one or more sources or the audio sources may be in communication to coordinate how, or from which perceived location, each of them presents their audio relative to a reference point. The reference point is typically the location of the user. Accordingly, in one or more examples, audio of the one or more audio sources is to be perceived as originating from one or more respective directions or ranges of directions around the user.

The proliferation of high-quality spatial audio services, applications and devices capable of rendering the spatial audio content (e.g. head-tracked binaural audio) will likely lead to a significant increase in their use. Conversely, increased interest in immersive media will lead to more and more offerings in the market. With increased use, it is likely that the user 200 will be consuming spatial audio content (or monophonic or stereophonic content presented as spatial audio) when the remote user 204 places a telecommunication call to them. Further, it is likely the user 200 will wish to multitask and thus utilize the capabilities of spatial audio presentation in new and creative ways.

In this and one or more examples, the user 200 may be experiencing immersive, spatial audio as described below. The user 200 may be working on a computer using the living-room big screen TV 403 and may be presented with spatial audio via head-tracked headphones 406. The headphones 406 may be provided with a microphone (not visible) to enable user participation in a telecommunication call with the remote user 204, if such a telecommunication call is established. It will be appreciated that the microphone may be independent of the headphones 406. The user 200 may have decided to receive audio related to what they are working on, to receive audio associated with the arrival of social media updates (from a social media application audio source), music (from a music player audio source) and telecommunication audio content (from a telecommunication device should a call be received or placed by the user). It will be appreciated that the user 200 may not have to make a decision on where the audio of the audio sources is to be perceived and instead the arrangement may be based on predefined preferences or rules or where there is no overlap with other audio sources. In some examples, the user can define at least some of the perceived directions or locations from which they perceive the audio via a user interface used to control at least some aspects of the spatial audio rendering and presentation. In one or more examples, the direction or location from which the user perceives the audio may be defined relative to the room or space in which the user is located. Accordingly, as the user rotates their head or moves around the room, the rendering of the audio may be modified to account for their new orientation or position in the room to maintain the perceived directions/locations in a fixed perceived direction/location.
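By way of illustration only, the compensation for head rotation described above may be sketched as follows. The sketch is not taken from the disclosure: the function name `world_to_head_azimuth`, the use of degrees, and the assumption that the source azimuth and the head yaw are measured in the same rotational sense from the same room-fixed reference are all hypothetical choices made for the example. It considers rotation only and ignores translation of the user within the room.

```python
def world_to_head_azimuth(source_azimuth_deg: float, head_yaw_deg: float) -> float:
    """Return the azimuth at which a room-fixed source should be rendered
    relative to the direction the listener is facing, in the range (-180, 180].

    Both angles are assumed (hypothetically) to be measured in degrees, in the
    same rotational sense, from the same room-fixed reference direction.
    """
    # Subtract the head rotation so that the source stays fixed in the room.
    relative = (source_azimuth_deg - head_yaw_deg) % 360.0
    # Wrap into (-180, 180] so that small negative angles stay small.
    if relative > 180.0:
        relative -= 360.0
    return relative


if __name__ == "__main__":
    # A source fixed at 30 degrees in the room, rendered as the head turns.
    for yaw in (0.0, 30.0, 90.0):
        print(yaw, world_to_head_azimuth(30.0, yaw))
```

In such a sketch the renderer would re-evaluate the relative azimuth whenever the head tracker reports a new yaw, so that the perceived origin remains fixed relative to the room rather than to the head.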

FIGS. 5 and 6 illustrate how the user may have arranged a spatial audio scene 500 which defines where in space they perceive the audio from the various audio sources. As mentioned previously, the spatial audio scene may be created automatically with or without user control. FIG. 5 shows the same view as FIG. 4 and FIG. 6 shows a plan view of the same audio presentation space 400.

The user 200 has placed their work-related audio such that, when presented, it will be perceived to originate from a forward position 501. The spatial audio scene 500 is further configured such that the music, when presented, will be perceived to originate from a front right position 502. The spatial audio scene 500 is further configured such that the audio of social media updates will be perceived to originate from a rearward position 503. The spatial audio scene 500 is further configured such that the audio of any incoming telecommunication audio content will be presented such that it will be perceived to originate from a front left position 504. Thus, a spatial audio scene may be created in which audio from one or more audio sources is presented as spatial audio, by headphones 406 or speakers 404, 405 or a combination thereof for example, to the user 200 such that the audio from each audio source is perceived from a different direction or location relative to the user 200. Accordingly, it can be understood that the perceived origin of audio of the audio sources has been virtually positioned around the user 200.

When an incoming immersive audio telecommunication call is received, a default configuration of the spatial audio scene 500 not forming part of the invention may be to replace the spatial audio presentation of the audio of the audio sources with the incoming telecommunication audio. However, this may be annoying or inconvenient for the user 200 who is currently multitasking and has even set up a preferred arrangement of the audio sources in the spatial audio scene 500.

As will be described below, how telecommunication audio content is presented in terms of presentation within the existing spatial audio scene 500 or replacing or modifying the presentation of other audio in the spatial audio scene 500 of the user 200 may be based on information received by the apparatus 100. Accordingly, the apparatus 100 may receive information about the importance of the secondary audio of the telecommunication audio content. Further, the apparatus 100 may receive information about the audio that is presented in the existing spatial audio scene 500 (if any).

Thus, in one or more examples, the apparatus 100 may comprise means configured to provide for presentation of incoming telecommunication audio content based at least in part on:

-   receipt of said audio content;
-   receipt of secondary audio importance information; and
-   receipt of current audio presentation information.

The apparatus 100 may receive telecommunication audio content which may be from the remote user device 202 for presentation to the user 200, such as via their local user device 201. The telecommunication audio content may comprise the voice of the remote user 204 and ambient audio audible at the location of the remote user 204. The telecommunication audio content may thus be categorised as primary audio and secondary audio, as described above.

Accordingly, in one or more examples, the primary audio may be defined as comprising voice audio. Thus, the primary audio may comprise audio determined to be generated by a voice of the remote user 204 who is in telecommunication with the user 200. The secondary audio may be defined as comprising ambient audio. The ambient audio may thus comprise audio other than that determined to be generated by the voice of the remote user 204.

In one or more other examples, the primary audio may be defined as comprising spatial audio content that includes directional information such that, when presented to the user 200, it is perceived as originating from a particular direction or range of directions in accordance with the directional information. In these one or more examples, it may be desirable to separate audio that is to be perceived from a more specific direction or range of directions from more diffuse audio that is to be perceived from a less specific direction or range of directions. In some examples, this may be achieved based on whether or not directional information is associated with the audio and thus audio with directional information is categorised as primary audio and audio without directional information is categorised as secondary audio, as it is deemed to make up the audio ambience. In other examples, it may be desirable to categorise, as secondary audio, audio that has associated directional information but said associated directional information indicates it should be presented such that it is perceived from a wide range of directions above a threshold range of directions. Accordingly, the threshold range may comprise 180° and therefore if the audio has directional information indicative that it should be presented with a perceived range of directions of origin less than 180° it is deemed primary audio and if the audio has directional information indicative of a perceived range of directions equal to or greater than 180° it is deemed secondary audio. It will be appreciated that the threshold may comprise any desired threshold to separate primary audio from secondary audio.
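Purely as a non-limiting sketch, the categorisation described above may be expressed as a small function. The names `AudioComponent`, `extent_deg` and `classify` are hypothetical and do not appear in the disclosure. The sketch also combines the two criteria (absence of directional information, and a range of directions at or above the threshold) into one function, which is an assumption made for the example; the description presents them as alternative examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioComponent:
    """Hypothetical description of one component of received audio content."""
    name: str
    # Angular range of directions (degrees) over which the component is to be
    # perceived; None means no directional information accompanies the audio.
    extent_deg: Optional[float]


def classify(component: AudioComponent, threshold_deg: float = 180.0) -> str:
    """Return "primary" or "secondary" for the given component."""
    if component.extent_deg is None:
        # No directional information: treated as part of the ambience.
        return "secondary"
    # Narrower than the threshold: primary; equal or wider: secondary.
    return "primary" if component.extent_deg < threshold_deg else "secondary"


if __name__ == "__main__":
    for c in (AudioComponent("voice", 20.0),
              AudioComponent("crowd", 360.0),
              AudioComponent("room tone", None)):
        print(c.name, classify(c))
```

The default threshold of 180° follows the example in the description; any other threshold could be passed as `threshold_deg`.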

The secondary audio importance information, in one or more examples, is associated with said telecommunication audio content and indicative of an importance of the secondary audio thereof. The importance may be a perceived importance set by the remote user 204. For example, the remote user 204 may make the telecommunication call from their noisy office and deem the secondary, ambient, audio to be unimportant. In other examples, the remote user 204 may be at a music concert and may want the user 200 to hear their audio experience. Thus, the remote user 204 may deem the secondary audio to be important and may indicate this to their remote apparatus or remote user device 202 for it to provide the secondary audio importance information for receipt by the apparatus 100. In one or more examples, the remote user 204 considering the ambient audio as unimportant may place the call using a ‘voice call’ application shortcut. In another example, the remote user 204 considering the ambient audio as important may place the call using an ‘immersive call’ application shortcut. In other words, for example the application icon on a smartphone user interface of the remote user device 202 may be different for the two cases of relative importance for the secondary, ambient, audio. Any other suitable way of indicating the importance of the ambience or secondary audio can also be used for the user to make their selection. It is understood a ‘voice call’ here may refer to a telecommunication call where the secondary audio content may or may not be present in the transmitted bitstream in a time-varying manner (for example based on the audio importance information and other bit-rate determining factors such as network congestion), while an ‘immersive call’ here refers to a telecommunication call where the secondary audio content is preferably available for presentation throughout the call.

In some examples, the remote apparatus or remote user device 202 for example may analyse the expected or contextual importance of the current environment either based on sensory data collected by the remote apparatus or remote user device 202, received for example from the cloud or nearby terminals, or a combination thereof. The remote apparatus or remote user device 202 may be configured to, based on this information, adapt the user interface such that for example the ‘immersive call’ application is shown first or otherwise offered as the preferred option for the user when the current environment or context is considered important for providing the secondary audio or ambience. On the other hand, when the secondary audio or ambience would be considered unimportant by the analysis, the ‘voice call’ shortcut may be shown first or otherwise offered as the preferred or default option for the user. In at least some examples, the network may provide further information for the device for making this adaptation. For example, the network may indicate that the network is congested (and ‘voice call’ should be preferred because an immersive call requires more bandwidth) or that there is a special pricing for an ‘immersive call’ in the current cell or at the current time. According to the various examples, the remote user 204 may thus select the importance of the “ambience” or the secondary audio via at least one means.

Thus, in one or more examples, said secondary audio importance information is received from the remote user device 202 with said telecommunication audio content, which may be captured by the remote user device 202. In one or more examples, the secondary audio importance information is set on a per telecommunication call basis. Thus, for each telecommunication call established, the remote user 204 may set the secondary audio importance information for sending to the apparatus 100. In one or more other examples, the setting of the secondary audio importance information may be automated based on predetermined criteria or based on automated audio analysis. For example, if the remote user 204 is at a location they frequent, the secondary audio may be automatically set to being unimportant. However, if the remote user 204 is at an unusual location such as a safari park, the secondary audio may be automatically set to being important for communicating as the secondary audio importance information. In one or more examples, said secondary audio importance information is determined by and received from a server 205 (shown in FIG. 2 as part of network 203) that receives said telecommunication audio content from the remote user device 202 and may forward it to the local user device 201 to provide said telecommunication.

In one or more examples, the current audio presentation information is indicative of at least whether or not audio from one or more audio sources is currently being presented to the user as spatial audio. Accordingly, the current audio presentation information may provide the apparatus 100 with information about the existing spatial audio scene 500. This information may be whether or not any audio sources are currently being presented or it may comprise details of the locations used for the spatial audio presentation of the audio of the audio sources. In one or more examples, the apparatus 100 may be configured to present the audio of the audio sources and therefore the current audio presentation information may be known to the apparatus 100.

The presentation of incoming telecommunication audio content may comprise providing for presentation of the primary audio of the telecommunication audio content and providing for presentation of the secondary audio based on the secondary audio importance information and the current audio presentation information. Thus, the primary audio may be presented independent of, i.e. without consideration of, the secondary audio importance information. The apparatus 100 may be configured to provide for presentation of the primary audio based on the current audio presentation information. However, the apparatus 100 may be configured to determine whether or not to and/or how (i.e. to be perceived from which direction) to present the secondary audio based on both the secondary audio importance information and the current audio presentation information.

In one or more examples, the apparatus 100 is configured to provide for presentation of the primary audio as spatial audio such that it is perceived to originate from a direction or range of directions. In one or more examples, said “range of directions” comprises one or both of (1) a “width” or “spatial extent” of a direction/location of the perceived origin of audio (i.e., the audio is not perceived as originating from a point-source) and (2) an area/sector where the audio is perceived from a point-source that is configured to move over said range of directions over time. From which direction the primary audio is perceived may be determined based on the current audio presentation information. For example, if the user 200 has designated a position or region 504 from which to perceive telecommunication audio, then the apparatus 100 may be configured to present the primary audio of the telecommunication audio as spatial audio such that it is perceived from said predetermined position 504. In one or more other examples, where a position 504 for telecommunication audio has not been predetermined, then the apparatus 100 may be configured to identify a position that is non-overlapping with a direction or range of directions, i.e. the directions from positions 501, 502 and 503 towards the user 200 for example, associated with the audio of the one or more audio sources based on the current audio presentation information. Thus, the apparatus 100 may be configured to identify an unused position in the existing spatial audio scene 500 to use when presenting the primary audio as spatial audio to be perceived from said unused position. In one or more other examples, the current audio presentation information may be indicative of no audio currently being presented and the apparatus 100 may be configured to present the primary audio of the telecommunication audio content as spatial audio from a perceived position in front of the user 200 or in accordance with any other arrangement defined by the directional information accompanying the spatial audio content of the telecommunication audio content.
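One possible way of identifying an unused position may be sketched as follows, by way of example only. The disclosure does not specify a search method; the brute-force scan over candidate azimuths, the function names `angular_distance` and `find_free_direction`, the representation of each occupied position as a (centre, width) pair in degrees, and the azimuth values in the usage example are all assumptions introduced for illustration. When nothing is being presented, the sketch returns 0°, taken here to mean directly in front of the user.

```python
from typing import List, Optional, Tuple


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute angle between two azimuths, in degrees (0..180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def find_free_direction(occupied: List[Tuple[float, float]],
                        step_deg: float = 1.0) -> Optional[float]:
    """Return an azimuth that does not overlap any occupied sector.

    occupied holds (centre_deg, width_deg) pairs for the audio already being
    presented. The candidate with the greatest clearance from the edge of the
    nearest occupied sector is chosen. Returns 0.0 (assumed to be the front)
    when nothing is occupied, and None when no free direction exists.
    """
    if not occupied:
        return 0.0
    best_dir: Optional[float] = None
    best_clearance = 0.0
    for i in range(int(round(360.0 / step_deg))):
        candidate = i * step_deg
        # Clearance to the nearest sector edge; <= 0 means inside a sector.
        clearance = min(angular_distance(candidate, centre) - width / 2.0
                        for centre, width in occupied)
        if clearance > best_clearance:
            best_dir, best_clearance = candidate, clearance
    return best_dir


if __name__ == "__main__":
    # Hypothetical azimuths (clockwise from the front) for positions
    # 501 (forward), 502 (front right) and 503 (rearward), each 30 degrees wide.
    scene = [(0.0, 30.0), (45.0, 30.0), (180.0, 30.0)]
    print(find_free_direction(scene))
```

With the hypothetical scene above the widest gap lies to the user's left, so the sketch selects a direction on that side. A practical implementation might additionally weight candidates, for instance to favour directions in front of the user.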

The secondary audio importance information may define at least two levels of importance comprising, for example, important and unimportant. It will be appreciated that more than two levels of importance may be defined.

In one or more examples, the secondary audio importance information may comprise recipient-adaptive information, wherein a remote user or remote user device 202 may be configured to, for example, receive information about the currently presented audio of the local user, and the secondary audio importance information may be based on said received information. Further, the secondary audio importance information may comprise a definition of a proposed replacement of at least one currently presented audio source by the secondary audio content. Upon receiving such a proposal, the apparatus 100 may be configured to provide for presentation of a user interface to receive user input from a local user who may at least choose to accept or reject said proposal to modify the current audio presentation.

In general, the apparatus may be configured to determine whether or not to present the secondary audio based on the secondary audio importance information and the current audio presentation information.

Further, in one or more examples, the apparatus may be configured to, in the event that the secondary audio should be presented, determine from where the secondary audio, presented as spatial audio, is to be perceived as originating based on current audio presentation information.

The secondary audio, in one or more examples, may be considered to be the ambient audio of the telecommunication call rather than the main content of the telecommunication call.

Accordingly, the apparatus 100 may be configured to, based on the secondary audio importance information being indicative of the secondary audio being important (or of greater importance than another “importance” designation of the secondary audio importance information), provide for presentation of the secondary audio, said presentation based on the current audio presentation information. Thus, based on the secondary audio being considered or designated as important or desirable for presentation, the apparatus 100 may deem itself obligated to present it. The apparatus 100 may then consider the way in which it is presented based on the current audio presentation information. For example, from which direction the secondary audio is heard may be determined based on the current audio presentation information. If the user has designated a position or region 504 from which to perceive telecommunication audio, then the apparatus 100 may be configured to present the secondary audio of the telecommunication audio as spatial audio such that it is perceived from said predetermined position 504. In one or more other examples, where a position 504 for telecommunication audio has not been predetermined, then the apparatus 100 may be configured to identify a position that is non-overlapping with a direction or range of directions, i.e. the directions from positions 501, 502 and 503 towards the user 200 for example, associated with the audio of the one or more audio sources based on the current audio presentation information. In one or more examples, the secondary audio may be presented as monophonic audio, and may therefore be presented without a perceived origin location or direction. In one or more other examples, in which the current audio presentation information is indicative of no audio currently being presented, the apparatus 100 may be configured to present the secondary audio of the telecommunication audio content as spatial audio from perceived position(s) or directions all around the user 200 in accordance with a spatial arrangement defined by the directional information accompanying the secondary audio of the telecommunication audio content.

In one or more examples, in which the secondary audio importance information is indicative of the secondary audio being important and the current audio presentation information is indicative of at least one of the one or more audio sources currently presenting audio to the user 200, the apparatus 100 may be configured to provide for presentation of the secondary audio and provide for modification of one or both of a volume or the range of directions with which the audio of at least one of the one or more audio sources is presented to accommodate presentation of the secondary audio. Thus, the volume with which the audio is presented from perceived positions 501, 502, 503 may be reduced. In one or more examples, the size of the perceived positions 501, 502, 503 may be reduced such that the audio of each audio source is perceived from a narrower range of directions. In one or more examples, the perceived positions 501, 502, 503 may be moved such that there is a greater amount of space in the existing spatial audio scene 500 for presentation of the secondary audio. It will be appreciated that the degree to which modification of one or more of the volume, perceived position of the audio source audio and perceived size of the position 501, 502, 503 is provided may be based on predetermined criteria and/or content of the secondary audio. For example, if the secondary audio comprises spatial audio content, then the directional information may indicate a range of directions over which the secondary audio should be presented to replicate the audio experience of the remote user 204. Accordingly, the apparatus 100 may be configured to modify the existing spatial audio scene using this range of directions.
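The modification of volume and range of directions described above may, as a minimal sketch only, be expressed as a scaling of each existing audio source. The `PlacedSource` type, the `accommodate` function and the scaling factors of 0.5 are hypothetical and chosen for the example; the disclosure leaves the degree of modification to predetermined criteria and/or the content of the secondary audio. Moving the perceived positions is not covered by this sketch.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class PlacedSource:
    """Hypothetical record of one audio source in the spatial audio scene."""
    name: str
    azimuth_deg: float   # direction of the perceived origin
    extent_deg: float    # range of directions over which it is perceived
    gain: float          # linear gain, where 1.0 means unmodified


def accommodate(sources: List[PlacedSource],
                gain_factor: float = 0.5,
                extent_factor: float = 0.5) -> List[PlacedSource]:
    """Return copies of the sources with reduced volume and narrower extent,
    leaving more of the scene free for the secondary audio."""
    return [replace(s,
                    gain=s.gain * gain_factor,
                    extent_deg=s.extent_deg * extent_factor)
            for s in sources]


if __name__ == "__main__":
    scene = [PlacedSource("work audio", 0.0, 40.0, 1.0),
             PlacedSource("music", 45.0, 40.0, 0.8)]
    for s in accommodate(scene):
        print(s)
```

The original records are left untouched so that the previous arrangement can be restored, for example when the telecommunication call ends; that restoration step is likewise an assumption of the sketch.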

In one or more examples, in which the secondary audio importance information is indicative of the secondary audio being unimportant, the apparatus 100 may be configured to not present the secondary audio at all. In one or more other examples, the apparatus 100 may be configured to provide for non-presentation of the secondary audio based on the current audio presentation information being indicative that at least one of the one or more audio sources is currently presenting audio to the user 200. Thus, if other audio sources are being presented it may be overly confusing for the non-important secondary audio to be presented as well and, accordingly, it may be ignored. In one or more examples, the designation of the secondary audio as unimportant may provide for non-sending of the secondary audio by the remote user device 202. Thus, the apparatus 100 may receive telecommunication audio content absent of secondary audio and comprising only primary audio.

In one or more other examples in which the secondary audio importance information is indicative of the secondary audio being unimportant, the apparatus 100 may be configured to provide for presentation of the secondary audio only when the current audio presentation information is indicative that none of the one or more audio sources are currently presenting audio to the user 200. Thus, if there is no current existing spatial audio scene 500, then the secondary audio may be presented regardless of it being deemed unimportant in the secondary audio importance information. In one or more examples, where the secondary audio importance information comprises three or more levels, a lowermost level of importance may always provide for non-presentation of the secondary audio while a higher but not highest level of importance may provide for presentation of the secondary audio only in limited circumstances based on the current audio presentation information, such as when the current audio presentation information is indicative that none of the one or more audio sources are currently presenting audio to the user 200.
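The three-level example above may be summarised, as a non-limiting sketch, by a simple decision function. The level names `LOW`, `MEDIUM` and `HIGH` and the function name `should_present_secondary` are hypothetical; the description refers only to a lowermost level, a higher but not highest level, and (elsewhere) secondary audio designated as important.

```python
from enum import IntEnum


class Importance(IntEnum):
    """Hypothetical three-level secondary audio importance."""
    LOW = 0      # lowermost level: never present the secondary audio
    MEDIUM = 1   # present only when no other audio is being presented
    HIGH = 2     # important: present the secondary audio


def should_present_secondary(importance: Importance,
                             other_audio_active: bool) -> bool:
    """Decide whether the secondary audio is presented.

    other_audio_active stands in for the current audio presentation
    information: True when at least one audio source is currently
    presenting audio to the user.
    """
    if importance == Importance.HIGH:
        return True
    if importance == Importance.MEDIUM:
        return not other_audio_active
    return False


if __name__ == "__main__":
    for level in Importance:
        for active in (False, True):
            print(level.name, active, should_present_secondary(level, active))
```

Where the secondary audio is to be presented, the manner of presentation (for example from which direction it is perceived) would still be determined from the current audio presentation information, which this sketch does not model.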

In one or more examples, the apparatus 100 may receive default perceived location information which defines a default perceived location for the audio content. Thus, the apparatus may provide for setting of a position, such as the position 504, as a default perceived location for the presentation of telecommunication audio content. The setting of said default perceived location may be provided in the current audio presentation information. It will be appreciated that in other examples a separate designation of the default perceived location may be provided independent of the current audio presentation information for use by the apparatus 100 in the presentation of the telecommunication audio content. In one or more examples, the apparatus 100 may be configured such that on receipt of the telecommunication audio content, at least said primary audio of said content is provided for presentation as spatial audio to be perceived as originating from said default perceived location, if said default perceived location is set. In one or more examples, the apparatus 100 may be configured such that on receipt of the telecommunication audio content and if said default perceived location is not set, at least said primary audio of said content is provided for presentation as spatial audio such that it is perceived from a direction or range of directions that is non-overlapping with any audio that is presented from the one or more audio sources. Thus the direction is automatically determined to avoid perceived spatial overlap between different audio.

The apparatus 100 may therefore be advantageous because on receipt of the telecommunication audio content comprising primary and secondary audio, such as a proposed immersive audio call, the presentation of the telecommunication audio content may be controlled by the presence of at least the secondary audio importance information. The secondary audio importance information may advantageously provide the remote user 204, the remote user device 202 or the server 205 with a means for signalling to the apparatus 100 regarding how important the secondary audio is and therefore the apparatus 100 may make informed choices about how to render and/or present the incoming telecommunication audio content to the user 200.

Above, the local user device 201 is described as receiving the telecommunication audio content from the remote user device 202. However, as will be appreciated, to provide two-way communication, the local user device 201 may likewise be configured to capture audio content of the user 200 and their surroundings, termed user-telecommunication audio content (or more generally “user audio content” to semantically distinguish it from the audio content described above). Thus, the apparatus 100 may be configured to send said user-telecommunication audio content to the remote user device 202 to provide for telecommunication between the user 200 and the remote user 204 of the remote user device 202. The apparatus may control a telecommunication device or transmitter to provide for the sending of the user-telecommunication audio content. In one or more examples, said user-telecommunication audio content comprises primary audio and secondary audio, similar to the telecommunication audio content.

Further, the apparatus 100 may be configured to provide for generation and sending of secondary audio importance information for use by the remote apparatus of the remote user device 202, for example. As will be appreciated, the secondary audio importance information provided by the apparatus 100 may be associated with said user-telecommunication audio content and indicative of an importance of the secondary audio of the user-telecommunication audio content.

The secondary audio importance information associated with the user-telecommunication audio content sent to the remote user device 202 may be based on user input received from the user 200. Thus, in one or more examples, the user 200 may indicate, through a user input, when initiating and/or during a telecommunication call the importance of their secondary audio. In one or more other examples, the secondary audio importance information associated with the user-telecommunication audio content may be automatically determined by audio analysis performed by the apparatus 100, the local user device 201 or the server 205. The user 200 may or may not be required to confirm the automatic determination of the secondary audio importance information and thus the apparatus 100 may or may not be configured to receive confirmatory user input. In one or more other examples, the secondary audio importance information associated with the user-telecommunication audio content may be automatically determined based on a current location of the user 200. For example, the current location of the user 200 may be compared to map data or historic locations at which the user 200 (or many other users) has been present to determine whether or not the current location is unusual or noteworthy and therefore the potential importance of the secondary audio.
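The location-based determination may be sketched as follows. This is only one conceivable heuristic: it treats the current location as "unusual" when it lies further than a chosen radius from every historic location. The function names, the 1 km default radius and the use of the haversine great-circle distance are all assumptions made for illustration.

```python
import math
from typing import Iterable, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two points on a sphere."""
    earth_radius_km = 6371.0
    lat1, lat2 = math.radians(a[0]), math.radians(b[0])
    dlat = lat2 - lat1
    dlon = math.radians(b[1] - a[1])
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def secondary_audio_is_important(current: LatLon,
                                 history: Iterable[LatLon],
                                 radius_km: float = 1.0) -> bool:
    """Treat the secondary audio as important when the location is unusual.

    The location is unusual if no historic location lies within radius_km.
    With an empty history every location counts as unusual.
    """
    return all(haversine_km(current, past) > radius_km for past in history)
```

The result could be offered to the user 200 for confirmation before it is sent as secondary audio importance information, in line with the optional confirmatory user input described above.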

As described above, when the secondary audio importance information is indicative of the secondary audio being unimportant, the apparatus 100 may be configured to not present it or present it as spatial audio from a particular location or without spatial audio presentation. Accordingly, the secondary audio may be provided over the network 203 but then ultimately unused. Thus, in one or more examples, the apparatus 100 may provide for modification of the user-telecommunication audio content to modify the secondary audio or remove the secondary audio prior to sending it to the remote user device 202 based on the secondary audio importance information associated with the user-telecommunication content being indicative of the secondary audio being unimportant. In one or more examples, only the primary audio may be provided for sending with or without associated directional information. In one or more examples, the apparatus 100 may be configured to provide for audio modification of the user-telecommunication audio content from being categorised as primary audio and secondary audio to comprising one of monophonic and stereophonic audio. Thus, if the secondary audio is not important then the relevance of providing an immersive call may be lost and the telecommunication call may be “downgraded” to a monophonic or stereophonic audio call. Such monophonic or stereophonic audio call may at least in some examples include at least some spatial information for the primary audio content which may be the only transmitted audio content during the audio call.
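The "downgrade" prior to sending can be sketched as below. The `UserAudioFrame` structure, its field names and the function `prepare_for_sending` are hypothetical; the sketch only shows the removal of the secondary audio, with the directional information for the primary audio optionally retained, and is not a definitive encoder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UserAudioFrame:
    # Hypothetical container for one frame of user-telecommunication audio.
    primary: List[float]                      # mono samples, e.g. the user's voice
    secondary: Optional[List[List[float]]]    # ambient channels, or None if removed
    primary_direction_deg: Optional[float] = None


def prepare_for_sending(frame: UserAudioFrame,
                        secondary_important: bool,
                        keep_direction: bool = True) -> UserAudioFrame:
    """Drop the secondary audio before sending when it is unimportant."""
    if secondary_important:
        # Immersive call: send the primary and secondary audio unchanged.
        return frame
    # Downgraded call: only the primary audio is sent, with or without
    # its associated directional information.
    return UserAudioFrame(
        primary=list(frame.primary),
        secondary=None,
        primary_direction_deg=frame.primary_direction_deg if keep_direction else None,
    )
```

Removing the secondary audio at the sender avoids transmitting content over the network 203 that would ultimately be unused at the remote user device 202.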

How the secondary audio is presented or if the secondary audio is presented at all may be selected by the apparatus 100 based at least on the secondary audio importance information. In one or more examples, it may be advantageous for the remote user 204 or the remote user device 202 to know the selection made by the apparatus 100. Accordingly, in one or more examples, the apparatus 100 may be configured to provide for sending of user-monitoring information to the remote user device 202, the user-monitoring information indicative of at least whether or not the secondary audio of the telecommunication audio is being presented to the user 200 for at least notifying said remote user 204 or remote user device 202.

Accordingly, based on the user-monitoring information, the remote apparatus equivalent to the apparatus 100 associated with the remote user device 202 may be configured to provide for informing the remote user 204 of whether or not the secondary audio they are sending to the user 200 is being presented to the user 200.

Likewise, based on user-monitoring information received by the apparatus 100 from the remote user device 202 or remote apparatus, the apparatus 100 may be configured to inform the user 200 whether or not the secondary audio of the user-telecommunication audio content they are sending to the remote user 204 is being presented to the remote user 204.

Informing the relevant users 200, 204 may comprise presentation of a message, such as a textual or pictorial or aural or haptic message.

The user-monitoring information may be considered as feedback to the source of the telecommunication audio or user-telecommunication audio to provide information about how it is being presented at its destination.

The user-monitoring information may also be indicative of other parameters of the presentation of the (user-)telecommunication audio content.

In one or more examples, the user-monitoring information may be indicative of a presentation position 504 comprising a position relative to the user 200 associated with the presentation of the telecommunication audio content to the user 200 such that the user 200 will perceive the telecommunication audio content to originate from said presentation position 504. Thus, the remote user device 202 or remote apparatus thereof will be informed of where in the presentation space 400 or spatial audio scene 500 the user 200 is currently perceiving the origin of the telecommunication audio content.

The remote user device 202 or remote apparatus may be configured to provide for presentation of the telecommunication audio content to the remote user 204 with an equivalent presentation position 504. Thus, the remote user's self-generated audio, i.e. the telecommunication audio content, will be captured and presented to the remote user 204 such that it is perceived from a front left position equivalent to position 504 relative to the user 200.

Likewise, the apparatus 100 may be configured to receive corresponding user-monitoring information, termed remote-user-monitoring information for clarity, from the remote user device 202 or remote apparatus thereof. Accordingly, based on the remote-user-monitoring information, the apparatus 100 may be configured to provide for presentation of the user-telecommunication audio content to the user 200 from a direction or location relative to the user 200 that corresponds to the presentation position indicated in said remote-user-monitoring information.

With reference to example FIG. 7, the remote user 204 is shown at 701, which illustrates their audio environment. The telecommunication audio content comprises at least the voice of the remote user 204 as primary audio and at least two instances of ambient secondary audio received from locations 705 (comprising a cat) and 706 (comprising a dog).

The user 200 is shown at 702, which illustrates the audio environment of the user 200 and the spatial audio scene presented to the user 200. The telecommunication audio content from the remote user (primary and secondary audio) happens to be presented such that it is perceived to originate from position 707. The user 200 is also listening to audio of an audio source, which is presented such that it is perceived to originate from position 708. In this example, the audio of the audio source is music, shown by a musical note. There are also some sources of ambient audio surrounding the user 200 at positions 709 (a child) and 710 (a second child).

The user-monitoring information provided by the apparatus 100 may include information indicative of the position 707, i.e. the location from which the user 200 is to perceive the telecommunication audio content.

The remote user 204 is shown at 703 having received said user-monitoring information. Accordingly, the remote apparatus of the remote user device has provided for presentation of the telecommunication audio content at position 711, which corresponds to the position 707. The remote user 204 thus knows where in the spatial audio scene 500 the user 200 is perceiving their telecommunication audio content. This may be advantageous for understanding the remote party's audio scene. Thus, the direction in which the audio content (e.g. primary audio and/or secondary audio) is to be, or is being, perceived by the user relative to a reference direction (e.g. the direction the user is facing) may be provided to the remote user device as the user-monitoring information. The audio content can then be presented to the remote user by the remote user device acting on the user-monitoring information to give the remote user an understanding of how their audio content is being presented to the user.

In one or more examples, the user-monitoring information may be indicative of audio of at least one of the one or more audio sources presented to the user as defined in the current audio presentation information for presenting to the remote user 204 by the remote user device 202 or an equivalent remote apparatus thereof. Thus, the user-monitoring information may comprise an audio stream, sent to the remote user device 202, of the audio source audio listened to by the user 200. In one or more examples, audio designated by the user 200 as being private will not be streamed. In one or more examples, the user-monitoring information may be indicative of a reference, such as a URL or link, to at least one (i.e. music represented at 708) of the one or more audio sources presented to the user 200.
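A possible shape for the user-monitoring information is sketched below, combining the indication of whether the secondary audio is presented, the presentation direction, and references to the non-private audio sources. The class `PresentedSource`, the function `build_user_monitoring_info` and the dictionary keys are hypothetical names; the disclosure does not define a message format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PresentedSource:
    # Hypothetical description of one audio source presented to the user.
    reference: str          # e.g. a URL or link from which the audio can be retrieved
    direction_deg: float    # direction from which the user perceives the source
    private: bool = False   # designated private by the user


def build_user_monitoring_info(secondary_presented: bool,
                               presentation_direction_deg: Optional[float],
                               sources: List[PresentedSource]) -> Dict[str, object]:
    """Assemble feedback for the remote user device (illustrative format)."""
    # Sources designated as private are never shared with the remote party.
    shared = [
        {"reference": s.reference, "direction_deg": s.direction_deg}
        for s in sources
        if not s.private
    ]
    return {
        "secondary_presented": secondary_presented,
        "presentation_direction_deg": presentation_direction_deg,
        "audio_sources": shared,
    }
```

A remote apparatus receiving such a message could, for instance, present the telecommunication audio content from the indicated direction and retrieve the referenced audio for presentation to the remote user 204.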

The use of the user-monitoring information when it comprises audio source audio or a reference thereto is shown at 704. The remote user 204 is shown at 704 in an audio scene that again includes the telecommunication audio content at position 711, which corresponds to the position 707, and the ambient audio at 706 and 705. However, in addition, based on the user-monitoring information, the remote apparatus may provide for presentation of the music, shown by the musical note, listened to by the user 200 to the remote user 204, as shown at 712, which may correspond to position 708. In this example, the secondary audio of the user-telecommunication audio is presented at positions 713 and 714 corresponding to positions 709 and 710. The child and second child are shown in boxes to illustrate that it is only their audio that is present in the spatial audio scene of the remote user 204 rather than them being physically present with the remote user 204.

It will be appreciated that while the description of FIG. 7 relates to how the remote user device 202 or remote apparatus thereof equivalent to the apparatus 100 may provide for presentation of audio based on the user-monitoring information, the apparatus 100 may perform equivalent actions based on user-monitoring information from the remote user device 202, termed remote-user-monitoring information.

Thus, the apparatus 100, based on remote-user-monitoring information received from the remote user device 202, may be configured to provide for presentation of the user-telecommunication content to the user 200 such that it will be perceived from a position relative to the user 200 that corresponds to a position relative to the remote user 204 from which the remote user perceives the user-telecommunication content.

Accordingly, in this example, the remote-user-monitoring information comprises a presentation position comprising a position relative to the remote user 204 associated with the presentation of the user-telecommunication audio content to the remote user 204 such that the remote user will perceive the user-telecommunication audio content (at least the primary audio thereof) to originate from said presentation position.

In one or more examples, the apparatus 100, based on remote-user-monitoring information received from the remote user device 202, may be configured to provide for presentation, to the user 200, of the audio of at least one of one or more audio sources currently being presented to the remote user 204. In this example, said remote-user-monitoring information received from the remote user device 202 comprises said audio or a reference to the at least one of the one or more audio sources presented to the remote user 204.

FIG. 8 shows a pair of flow charts illustrating the telecommunication call from the point of view of the remote apparatus at 801 and the apparatus 100 at 802. It will be appreciated that typically said telecommunication call is two-way and therefore each apparatus may perform the actions in both flow charts based at least in part on the information and content received from the other.

Accordingly, flow chart 801 illustrates the remote user 204 initiating an immersive telecommunication call at 803. The remote user 204 may further provide user input to indicate the importance of the secondary audio at 804. Based on the user input, the appropriate signalling, termed the secondary audio importance information, is generated and transmitted at 805. At 806, the remote apparatus may receive the user-monitoring information from the apparatus 100 of the user 200.

Flow chart 802 illustrates the user 200 having the audio of audio sources being presented. At step 807, the user 200 may set the current audio presentation information. At 808, the apparatus 100 may receive the telecommunication audio content from the remote user along with the secondary audio importance information sent at step 805. At step 809, the apparatus 100 may provide for rendering of the audio of the audio sources along with the telecommunication audio content based on the secondary audio importance information and the current audio presentation information and any optional default perceived location.

Accordingly, we also disclose an apparatus 100 configured to provide user-telecommunication audio content to the remote user device 202 for presentation to a remote user 204 via a local user device, said user-telecommunication audio content comprising audio of at least a user of the local user device 201, said user-telecommunication audio content comprising primary audio and secondary audio. The apparatus 100 may be further configured to provide secondary audio importance information associated with said user-telecommunication audio content that is indicative of an importance of the secondary audio of said user-telecommunication audio content.

FIG. 9 shows a flow diagram illustrating the steps of:

-   receiving 900 audio content from a remote user device, the audio content comprising primary audio and secondary audio, the secondary audio being different to the primary audio and comprising ambient audio;
-   receiving 901 secondary audio importance information associated with said audio content and indicative of an importance of the secondary audio;
-   receiving 902 current audio presentation information indicative of at least whether audio from one or more audio sources is currently being presented as spatial audio such that respective audio of the one or more audio sources is to be perceived as originating from one or more respective directions or ranges of directions around a reference point;
-   providing for presentation 903 of the primary audio; and
-   providing for presentation 904 of the secondary audio based on the secondary audio importance information and the current audio presentation information.

FIG. 10 illustrates schematically a computer/processor readable medium 1000 providing a program according to an example. In this example, the computer/processor readable medium is a disc such as a digital versatile disc (DVD) or a compact disc (CD). In some examples, the computer readable medium may be any medium that has been programmed in such a way as to carry out an inventive function. The computer program code may be distributed between multiple memories of the same type, or multiple memories of a different type, such as ROM, RAM, flash, hard disk, solid state, etc.

User inputs may be gestures which comprise one or more of a tap, a swipe, a slide, a press, a hold, a rotate gesture, a static hover gesture proximal to the user interface of the device, a moving hover gesture proximal to the device, bending at least part of the device, squeezing at least part of the device, a multi-finger gesture, tilting the device, or flipping a control device. Further, the gestures may be any free space user gesture using the user's body, such as their arms, or a stylus or other element suitable for performing free space user gestures.

The apparatus shown in the above examples may be a portable electronic device, a laptop computer, a mobile phone, a Smartphone, a tablet computer, a personal digital assistant, a digital camera, a smartwatch, smart eyewear, a pen based computer, a non-portable electronic device, a desktop computer, a monitor, a smart TV, a server, a wearable apparatus, a virtual reality apparatus, or a module/circuitry for one or more of the same.

Any mentioned apparatus and/or other features of particular mentioned apparatus may be provided by apparatus arranged such that they become configured to carry out the desired operations only when enabled, e.g. switched on, or the like. In such cases, they may not necessarily have the appropriate software loaded into the active memory in the non-enabled (e.g. switched off) state and only load the appropriate software in the enabled (e.g. on) state. The apparatus may comprise hardware circuitry and/or firmware. The apparatus may comprise software loaded onto memory. Such software/computer programs may be recorded on the same memory/processor/functional units and/or on one or more memories/processors/functional units.

In some examples, a particular mentioned apparatus may be pre-programmed with the appropriate software to carry out desired operations, and wherein the appropriate software can be enabled for use by a user downloading a “key”, for example, to unlock/enable the software and its associated functionality. Advantages associated with such examples can include a reduced requirement to download data when further functionality is required for a device, and this can be useful in examples where a device is perceived to have sufficient capacity to store such pre-programmed software for functionality that may not be enabled by a user.

Any mentioned apparatus/circuitry/elements/processor may have other functions in addition to the mentioned functions, and these functions may be performed by the same apparatus/circuitry/elements/processor. One or more disclosed aspects may encompass the electronic distribution of associated computer programs and computer programs (which may be source/transport encoded) recorded on an appropriate carrier (e.g. memory, signal).

Any “computer” described herein can comprise a collection of one or more individual processors/processing elements that may or may not be located on the same circuit board, or the same region/position of a circuit board or even the same device. In some examples one or more of any mentioned processors may be distributed over a plurality of devices. The same or different processor/processing elements may perform one or more functions described herein.

The term “signalling” may refer to one or more signals transmitted as a series of transmitted and/or received electrical/optical signals. The series of signals may comprise one, two, three, four or even more individual signal components or distinct signals to make up said signalling. Some or all of these individual signals may be transmitted/received by wireless or wired communication simultaneously, in sequence, and/or such that they temporally overlap one another.

With reference to any discussion of any mentioned computer and/or processor and memory (e.g. including ROM, CD-ROM etc.), these may comprise a computer processor, Application Specific Integrated Circuit (ASIC), field-programmable gate array (FPGA), and/or other hardware components that have been programmed in such a way as to carry out the inventive function.

The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole, in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that the disclosed aspects/examples may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the disclosure.

While there have been shown and described and pointed out fundamental novel features as applied to examples thereof, it will be understood that various omissions and substitutions and changes in the form and details of the devices and methods described may be made by those skilled in the art without departing from the scope of the disclosure. For example, it is expressly intended that all combinations of those elements and/or method steps which perform substantially the same function in substantially the same way to achieve the same results are within the scope of the disclosure. Moreover, it should be recognized that structures and/or elements and/or method steps shown and/or described in connection with any disclosed form or examples may be incorporated in any other disclosed or described or suggested form or example as a general matter of design choice. Furthermore, in the claims means-plus-function clauses are intended to cover the structures described herein as performing the recited function and not only structural equivalents, but also equivalent structures. Thus, although a nail and a screw may not be structural equivalents in that a nail employs a cylindrical surface to secure wooden parts together, whereas a screw employs a helical surface, in the environment of fastening wooden parts, a nail and a screw may be equivalent structures.

1. An apparatus comprising: at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: receive audio content from a remote user device, the audio content comprising primary audio and secondary audio, the secondary audio being different to the primary audio; receive secondary audio importance information associated with said audio content and indicative of an importance of the secondary audio; receive current audio presentation information indicative of at least whether audio from one or more audio sources is currently being presented as spatial audio such that respective audio of the one or more audio sources is to be perceived as originating from one or more respective directions or ranges of directions around a reference point; provide for presentation of the primary audio; and provide for presentation of the secondary audio based on the secondary audio importance information and the current audio presentation information.
2. The apparatus according to claim 1, wherein the secondary audio importance information defines at least two levels of importance comprising important and unimportant and wherein the apparatus is further configured to: provide for presentation of the secondary audio based on the secondary audio importance information being indicative of the secondary audio being important, said presentation based on the current audio presentation information; and at least one of: provide for presentation of the secondary audio based on the secondary audio importance information being indicative of the secondary audio being unimportant and the current audio presentation information being indicative that none of the one or more audio sources are currently presenting audio; or provide for non-presentation of the secondary audio based on the secondary audio importance information being indicative of the secondary audio being unimportant and the current audio presentation information being indicative that at least one of the one or more audio sources are currently presenting audio.
3. The apparatus according to claim 2, wherein the apparatus is further configured to: provide for presentation of the secondary audio, based on the secondary audio importance information being indicative of the secondary audio being important and the current audio presentation information being indicative of at least one of the one or more audio sources currently presenting audio, and provide for modification of one or both of a volume or the range of directions with which the audio of at least one of the one or more audio sources is presented to accommodate presentation of the secondary audio.
4. The apparatus according to claim 2, wherein the apparatus is further configured to: provide for presentation of the secondary audio such that it is perceived as originating from all directions around the user, based on the secondary audio importance information being indicative of the secondary audio being important and the current audio presentation information being indicative that none of the one or more audio sources are currently presenting audio.
5. The apparatus according to claim 1, wherein the apparatus is further configured to: provide for presentation of the primary audio as spatial audio such that it is to be perceived as originating from a direction or range of directions that is non-overlapping with a direction or range of directions associated with the audio of the one or more audio sources based on the current audio presentation information.
6. The apparatus according to claim 1, wherein the apparatus is further configured to: receive default perceived location information which defines a default perceived location for the audio content; and provide for presentation of at least said primary audio of said audio content as spatial audio to be perceived as originating from said default perceived location.
7. The apparatus according to claim 1, wherein the apparatus is further configured to: capture user audio content of a user; send said captured user audio content to the remote user device to provide for telecommunication between the user and the remote user of the remote user device, wherein said user audio content comprises primary audio and secondary audio, the secondary audio being different to the primary audio; and send secondary audio importance information associated with said user audio content and indicative of an importance of the secondary audio of the user audio content for use by the remote user device, the secondary audio importance information based on user input received from the user.
8. The apparatus according to claim 7, wherein the secondary audio importance information is based on one or more of: audio analysis of the user audio content; or a determined current location of the user.
9. The apparatus according to claim 7, wherein the apparatus is further configured to, on determination that the secondary audio importance information associated with said captured user audio content is indicative of the user audio content being unimportant, modify the captured user audio content from being categorised as primary audio and secondary audio to one of monophonic or stereophonic audio prior to said sending of the captured user audio content or capture the user audio content as one of monophonic and stereophonic.
10. The apparatus according to claim 1, wherein the apparatus is further configured to: send user-monitoring information to the remote user device, the user-monitoring information indicative of whether or not the secondary audio is being presented for at least notifying said remote user or remote user device.
11. The apparatus according to claim 1, wherein the apparatus is further configured to: send user-monitoring information to the remote user device, the user-monitoring information indicative of one or more of: a presentation direction comprising a direction defined relative to a reference direction comprising from where the audio content is to be perceived when the apparatus is in use, the presentation direction for use in presenting the audio content to the remote user by the remote user device; audio of at least one of the one or more audio sources presented to the user as defined in the current audio presentation information for presenting to the remote user by the remote user device; or a reference to at least one of the one or more audio sources presented to the user as defined in the current audio presentation information from which the audio can be retrieved for presenting said audio of the at least one of the one or more audio sources to the remote user by the remote user device.
12. The apparatus according to claim 7, wherein the apparatus is further configured to: receive remote-user-monitoring information from the remote user device, and one or more of: provide for presentation of the user audio content to be perceived as originating from a position corresponding to a presentation position, wherein the remote-user-monitoring information from the remote user device comprises the presentation position which is indicative of a position relative to the remote user from where the user audio content is to be perceived when presented by the remote user device; or provide for presentation of the audio of at least one of one or more audio sources currently being presented to the remote user, wherein said remote-user-monitoring information received from the remote user device comprises said audio or a reference to the at least one of the one or more audio sources presented to the remote user.
13. A method, the method comprising: receiving audio content from a remote user device, the audio content comprising primary audio and secondary audio, the secondary audio being different to the primary audio; receiving secondary audio importance information associated with said audio content and indicative of an importance of the secondary audio; receiving current audio presentation information indicative of at least whether audio from one or more audio sources is currently being presented as spatial audio such that respective audio of the one or more audio sources is to be perceived as originating from one or more respective directions or ranges of directions around a reference point; providing for presentation of the primary audio; and providing for presentation of the secondary audio based on the secondary audio importance information and the current audio presentation information.
14. The method according to claim 13, wherein the secondary audio importance information defines at least two levels of importance comprising important and unimportant and wherein the method further comprises: providing for presentation of the secondary audio based on the secondary audio importance information being indicative of the secondary audio being important, said presentation based on the current audio presentation information; and at least one of: providing for presentation of the secondary audio based on the secondary audio importance information being indicative of the secondary audio being unimportant and the current audio presentation information being indicative that none of the one or more audio sources are currently presenting audio; or providing for non-presentation of the secondary audio based on the secondary audio importance information being indicative of the secondary audio being unimportant and the current audio presentation information being indicative that at least one of the one or more audio sources are currently presenting audio.
 15. The method according to claim 14, further comprising: providing for presentation of the secondary audio, based on the secondary audio importance information being indicative of the secondary audio being important and the current audio presentation information being indicative of at least one of the one or more audio sources currently presenting audio, and providing for modification of one or both of a volume or the range of directions with which the audio of at least one of the one or more audio sources is presented to accommodate presentation of the secondary audio.
 16. The method according to claim 14, further comprising: providing for presentation of the secondary audio such that it is perceived as originating from all directions around the user, based on the secondary audio importance information being indicative of the secondary audio being important and the current audio presentation information being indicative that none of the one or more audio sources are currently presenting audio.
 17. The method according to claim 13, further comprising: providing for presentation of the primary audio as spatial audio such that it is to be perceived as originating from a direction or range of directions that is non-overlapping with a direction or range of directions associated with the audio of the one or more audio sources based on the current audio presentation information.
 18. The method according to claim 13, further comprising: receiving default perceived location information which defines a default perceived location for the audio content; and providing for presentation of at least said primary audio of said audio content as spatial audio to be perceived as originating from said default perceived location.
 19. The method according to claim 1, further comprising: capturing user audio content of a user; sending said captured user audio content to the remote user device to provide for telecommunication between the user and the remote user of the remote user device, wherein said user audio content comprises primary audio and secondary audio, the secondary audio being different to the primary audio; and sending secondary audio importance information associated with said user audio content and indicative of an importance of the secondary audio of the user audio content for use by the remote user device, the secondary audio importance information based on user input received from the user.
 20. A non-transitory computer readable medium comprising program instructions stored thereon for performing at least the following: receiving audio content from a remote user device, the audio content comprising primary audio and secondary audio, the secondary audio being different to the primary audio; receiving secondary audio importance information associated with said audio content and indicative of an importance of the secondary audio; receiving current audio presentation information indicative of at least whether audio from one or more audio sources is currently being presented as spatial audio such that respective audio of the one or more audio sources is to be perceived as originating from one or more respective directions or ranges of directions around a reference point; providing for presentation of the primary audio; and providing for presentation of the secondary audio based on the secondary audio importance information and the current audio presentation information.
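
By way of illustration only, the conditional presentation of the secondary audio recited in claims 14 to 16 may be sketched as a small decision function. The sketch is not part of the claims and does not limit them. It assumes a two-level importance value and a single flag summarising whether any of the one or more audio sources is currently presenting audio, and it implements both alternatives of the "at least one of" wording of claim 14. All names (Importance, SecondaryAudioDecision, decide_secondary_audio) are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass
from enum import Enum


class Importance(Enum):
    """Hypothetical two-level importance, as in claim 14."""
    IMPORTANT = "important"
    UNIMPORTANT = "unimportant"


@dataclass(frozen=True)
class SecondaryAudioDecision:
    """Hypothetical result describing how the secondary audio is handled."""
    present: bool                          # whether the secondary audio is presented
    all_directions: bool = False           # claim 16: perceived from all directions
    modify_existing_sources: bool = False  # claim 15: adjust volume or range of directions


def decide_secondary_audio(importance: Importance,
                           sources_presenting: bool) -> SecondaryAudioDecision:
    """Decide presentation of the secondary audio.

    sources_presenting is an assumed summary of the current audio
    presentation information: True if at least one audio source is
    currently presenting audio.
    """
    if importance is Importance.UNIMPORTANT:
        # Claim 14: present only when no other source is presenting audio,
        # otherwise provide for non-presentation.
        return SecondaryAudioDecision(present=not sources_presenting)
    if sources_presenting:
        # Claim 15: present, and modify the audio already being presented
        # to accommodate the secondary audio.
        return SecondaryAudioDecision(present=True, modify_existing_sources=True)
    # Claim 16: nothing else is presenting, so the secondary audio may be
    # perceived as originating from all directions around the user.
    return SecondaryAudioDecision(present=True, all_directions=True)
```

For example, unimportant secondary audio received while another audio source is presenting would not be presented, whereas important secondary audio received in the same situation would be presented together with a modification of the audio already being presented.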